Likelihood Ratios for Time-Continuous Data Models: The White Noise Approach

A.  V.  Balakrishnan 

System  Science  Department 
University  of  California 

Abstract. We develop a formula for likelihood functionals for signals in additive
noise in the time-continuous case using a white noise approach. It is shown that
the formula differs from the well-known formula in the Wiener process version by the
appearance of an additional term corresponding to the conditional mean square
filtering error.

1. Introduction. In much of the engineering literature on identification (too
voluminous to be referred to individually; see the several volumes of proceedings
of IFAC Symposia on System Identification and Parameter Estimation, 1970, 1973 and
1976) it is customary to consider the observed data as sampled periodically in time,
even when the basic phenomena are modelled by time-continuous differential
equations. The usual 'hand-waving' argument is then made that the 'limiting'
continuous-time case is no more than a mathematical detail; and that anyhow in
digital computer processing, conversion to sampled data is a basic step. This is
indeed true; but the authors almost invariably proceed to use the model:

$$ y_n = S_n + N_n \tag{1} $$

where $\{S_n\}$ is the information-bearing time series and $\{N_n\}$ the observation
noise series, and (this is the crucial point) take $\{N_n\}$ as a sequence of
independent variables. But this requires that the sampling rate be not more than
twice the noise bandwidth, itself unknown. Of course, to answer this objection, one
can allow $\{N_n\}$ to be correlated; but then the correlation function must be
known. Now it is very unrealistic to require the correlation function of instrument
noise; and even when known, it adds a lot to the complication, but little to the
performance. We maintain that it is much better to use a time-continuous model

$$ y(t) = S(t) + N(t) \tag{2} $$

and allow the sampling rate to be as high as the A-D converter wants to use. But
in the time-continuous model we are faced with another problem: the basic tool in
identification is the likelihood ratio (for fixed parameters), that is, the
Radon-Nikodym derivative of the probability measure induced by the process y(·)
with respect to that induced by N(·). But this likelihood ratio is difficult to
implement even when the spectrum of N(·) is known, which it is not. What we can say
for sure is that the bandwidth of the (instrument) noise is large compared with that
of the process S(·). At this point the earlier engineering literature used the
notion of "white noise", a process with constant spectral density over all
frequencies, in a formal way. In the sixties it became fashionable to replace this
by the "Wiener process" model as "more rigorous". Thus we replace (2) by


$$ Y(t) = \int_0^t S(\sigma)\, d\sigma + W(t) \tag{3} $$

where W(t) is a Wiener process. We have then, to be sure, the advantage of the
powerful machinery of martingales and Ito integrals. In fact the likelihood
functional (for the case where signal and noise are independent, which we assume
throughout) can then be written down as (see [1]):

$$ \exp\Big( -\tfrac{1}{2} \Big\{ \int_0^T \|\hat{S}(t)\|^2\, dt \;-\; 2 \int_0^T [\hat{S}(t),\, dY(t)] \Big\} \Big) \tag{4} $$


where $\hat{S}(t)$ is the best mean square estimate of S(t) given the observation
Y(s) up to time t. But the catch is that the second term is an Ito integral:

$$ \int_0^T [\hat{S}(t),\, dY(t)] $$


This integral is defined on the basis that Y(t) is of unbounded variation with
probability one. On the other hand no physical instrument can produce such a
waveform! Indeed, we must now go back to (2), where it came from, and thus replace

    dY(t) by y(t)dt.

This is all right if S(t) is deterministic; if not, we no longer obtain the value
prescribed by the Ito formula! In particular, any algorithm based on it leads to
erroneous results. Most authors of papers on the subject probably have never
bothered to calculate anything based on actual data; and of course in any digital
computer simulation it is possible to mask this completely. Indeed, almost all
simulation models employ the discrete version (1).
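The discrepancy between the Ito value and the value obtained by ordinary calculus can be seen in the simplest scalar analogue, the integral of W dW, where the integrand depends on the path being integrated against. The sketch below is an illustration added here, not a computation from the paper; the function name `ito_vs_smooth`, the step count and the seed are arbitrary choices. The left-endpoint sum approximates the Ito integral, while the midpoint sum gives W(T)^2/2, which is what treating the path as differentiable would produce.

```python
import numpy as np

def ito_vs_smooth(T=1.0, n=100_000, seed=0):
    """Compare the Ito sum for int_0^T W dW with the value that ordinary
    calculus gives when the path is treated as differentiable."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(T / n), size=n)   # Wiener increments
    W = np.concatenate(([0.0], np.cumsum(dW)))     # path with W(0) = 0
    ito = np.sum(W[:-1] * dW)                      # left-endpoint (Ito) sum
    smooth = np.sum(0.5 * (W[:-1] + W[1:]) * dW)   # midpoint sum, telescopes to W(T)^2 / 2
    return ito, smooth, W[-1]

ito, smooth, WT = ito_vs_smooth()
print(ito, smooth, smooth - ito)                   # the gap is close to T / 2
```

The gap equals half the sum of squared increments, which concentrates near T/2 as the step size shrinks; it does not vanish with finer sampling.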

Faced with this difficulty we have to examine the model again more precisely.
Thus what we want to exploit is the fact that the bandwidth of the noise is large
compared to that of the process S(·). Hence what is really needed is the
'asymptotic form' of the likelihood functional as the bandwidth goes to infinity
in an arbitrary manner.

Such a theory has been developed by the author using a precise notion of
white noise. See [2] for details. We take the 'sample points' to be in a Hilbert
space with Gauss measure thereon. Thus in (2) we consider N(t), 0 < t < T, as
pathwise square integrable in [0,T]; that is, as elements in the space
$L_2[R_n; (0,T)]$ (the observation having its range in n-dimensional Euclidean
space). Corresponding to white noise with 'unit' spectral density, we define the
Gauss measure by:

$$ E\Big[ e^{\, i \int_0^T [N(t),\, h(t)]\, dt} \Big] = \exp\Big( -\tfrac{1}{2} \int_0^T [h(t),\, h(t)]\, dt \Big) $$

for each h(·) in $L_2[R_n; (0,T)]$, defining thus the characteristic function of the
Gauss measure.


The difference between this set-up and the Wiener-process set-up is simply
this. Let $\{\phi_n\}$ denote a complete orthonormal system in $L_2[R_n; (0,T)]$. Then

$$ \int_0^T [\phi_n(t),\, N(t)]\, dt = c_n $$

yield a sequence of zero-mean, unit variance Gaussians. The sample space for the
sequence is $\ell_2$, since

$$ \sum_{n=1}^{\infty} c_n^2 = \int_0^T \|N(t)\|^2\, dt < \infty . $$

On the other hand, given such a sequence it is standard practice to take
$R^\infty$ as the sample space and, via the Kolmogorov theory, construct a countably
additive measure on the Borel sets of $R^\infty$. [This is also the countably
additive extension to Nuclear Spaces via the Minlos theorem.] This is in fact the
Wiener process theory, in which, of course, all of $\ell_2$ has zero measure. Both
set-ups of course agree on the measures of cylinder sets. What is rendered
difficult by using $\ell_2$ as the sample space is the notion of a random variable.
Whereas this is trivial in the $R^\infty$ model (any Borel-measurable function being
a random variable), it is the central issue in the $\ell_2$ set-up. In other words,
given any functional f(·) on $L_2[R_n; (0,T)]$, even continuous thereon, it need not
define a random variable. We define it as a random variable if and only if for any
sequence $\{P_n\}$ of finite dimensional projections converging strongly to the
identity, the sequence $\{f(P_n(\cdot))\}$ is Cauchy in probability, and all such
sequences are equivalent. Thus we have a smaller class of random variables; the
implication being that the Ito integrals in the Wiener process theory may not
correspond to random variables on $\ell_2$. Moreover the 'limiting' notion
corresponds to the 'bandwidth expanding' notion.
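Two of the statements above can be checked numerically on finite dimensional projections. The following is a minimal sketch under assumptions of our own (the dimension, sample sizes, seed and the test element `h_coef` are arbitrary and not from the paper). Part (a) estimates the characteristic function of the first n noise coefficients and compares it with the Gauss-measure value; part (b) shows that independent unit-variance coefficients have partial sums of squares growing like n, which is why $\ell_2$ carries zero measure in the $R^\infty$ model.

```python
import numpy as np

rng = np.random.default_rng(1)

# (a) Cylinder-set agreement: on the span of the first n basis functions the
# noise coefficients c_1..c_n are i.i.d. N(0, 1), so E exp(i [N, h]) should be
# exp(-||P_n h||^2 / 2), where h_coef holds the coefficients of P_n h.
n, n_samples = 8, 200_000
h_coef = rng.normal(size=n) / np.sqrt(n)        # hypothetical test element
c = rng.normal(size=(n_samples, n))             # samples of (c_1, ..., c_n)
empirical = np.mean(np.exp(1j * (c @ h_coef)))  # Monte Carlo estimate
exact = np.exp(-0.5 * np.sum(h_coef ** 2))

# (b) In the R^infinity model the partial sums of c_k^2 grow like n, so the
# sequence is not square summable.
c_long = rng.normal(size=1_000_000)
ratio = np.sum(c_long ** 2) / c_long.size       # near 1 by the law of large numbers
print(abs(empirical - exact), ratio)
```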


2. Likelihood Ratio: White Noise Theory.

Let us now examine likelihood ratios (Radon-Nikodym derivatives) in terms of
the white noise theory. Let

$$ y(t) = S(t) + N(t), \qquad 0 < t < T < \infty \tag{2.1} $$

where S(·) and N(·) are independent processes. We shall assume that the signal
S(·) has finite energy:


$$ \int_0^T E\big(\|S(t)\|^2\big)\, dt < \infty \tag{2.2} $$


For each t, 0 < t ≤ T, let

$$ W(t) = L_2[R_n; (0,t)] $$

We shall shorten W(T) to simply W. Under condition (2.2), the process S(·) induces
a countably additive measure on W (and hence on W(t) for each t). [In other words,
the cylinder measure on W can be extended to be countably additive; this is a
consequence of the Sazonov theorem, see [2].] Thus (2.1) defines a weak
distribution on W defined by the characteristic function:


$$ E\big[e^{\,i[y,h]}\big] = C_S(h)\, \exp\big( -\tfrac{1}{2} \|h\|^2 \big) \tag{2.3} $$

where

$$ C_S(h) = E\big[e^{\,i[S,h]}\big] \tag{2.4} $$

where we have used the inner-product notation:

$$ [S,h] = \int_0^T [S(t),\, h(t)]\, dt, \qquad h \in W. $$

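Then the cylinder measure induced by y(·) is absolutely continuous with respect
to Gauss measure $\mu_G$ and the Radon-Nikodym derivative is defined by the function:

$$ f(\omega) = \int_W \exp\Big( -\tfrac{1}{2} \big\{ \|S\|^2 - 2\,[S,\omega] \big\} \Big)\, d\mu_S \tag{2.5} $$

where $\mu_S$ denotes the measure induced on W by S(·).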


Thus for any cylinder set C,

$$ \mu(C) = \lim_{n \to \infty} \int_C f(P_n \omega)\, d\mu_G $$

where $\{P_n\}$ is any sequence of finite dimensional projections strongly convergent
to the identity. This result has been proved in [3].

Let $\{\phi_n\}$ be an orthonormal basis in W and let L denote the mapping of W into
$\ell_2$:

$$ Lx = a; \qquad a_n = \int_0^T [x(\sigma),\, \phi_n(\sigma)]\, d\sigma . $$

Let

$$ LS = \zeta . $$

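Let $\mu_\zeta$ denote the measure induced on $\ell_2$ by this mapping. Then we can
rewrite (2.5) in the form

$$ f(\omega) = \int_{\ell_2} \exp\Big( -\tfrac{1}{2} \big\{ [\zeta,\zeta] - 2\,[\zeta,\, L\omega] \big\} \Big)\, d\mu_\zeta \tag{2.6} $$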


It must be emphasised that (2.6) is defined for every element ω in W. Note also
that (2.6) can be defined with respect to any orthonormal system $\{\phi_n\}$.
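As a quick sanity check on (2.5), consider the special case of a deterministic signal; this worked case is an illustration added here, not part of the original derivation. The signal measure is then a point mass at a known waveform, written $S_0$ (a name introduced only for this example), and the integral collapses to the classical formula for a known signal in white noise:

```latex
% Assumption for illustration: S(.) = S_0(.) with probability one, S_0 in W.
\[
  f(\omega) \;=\; \exp\Big( -\tfrac{1}{2}\big\{ \|S_0\|^2 - 2\,[S_0, \omega] \big\} \Big),
  \qquad
  [S_0, \omega] = \int_0^T [S_0(t), \omega(t)]\, dt .
\]
% In coordinates, with \zeta_0 = L S_0, (2.6) gives the same value because
% \|S_0\|^2 = [\zeta_0, \zeta_0] and [S_0, \omega] = [\zeta_0, L\omega].
```

Here the inner product is an ordinary integral of the observed waveform against $S_0$, so no stochastic integral is needed in this case.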

The likelihood functional f(y), where y(·) is the observation, will now be
expressed in a form similar to (4). For this purpose, let (2.6) be defined with
respect to the orthonormal system $\{\phi_n\}$. For each t, 0 < t ≤ T, define the
operators A(t), mapping W into $\ell_2$, by:

$$ A(t)x = a; \qquad a_n = \int_0^t [\phi_n(\sigma),\, x(\sigma)]\, d\sigma \tag{2.7} $$


Let

$$ R(t) = A(t)\, A(t)^* . \tag{2.8} $$

Then the Radon-Nikodym derivative of the measure induced by the process y(·) over
[0,t] with respect to Gauss measure on W(t) is given by:

$$ f(t,\omega) = \int_{\ell_2} \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)\omega] \big\} \Big)\, d\mu_\zeta \tag{2.9} $$
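The operators in (2.7) and (2.8) can be made concrete on a grid. The sketch below is only an illustration under assumptions of our own: a scalar observation, a truncated cosine basis, midpoint quadrature, and hypothetical coefficients `zeta`; none of these choices come from the paper. It shows that R(t) is the Gram matrix of the basis restricted to (0, t), that R(T) is the identity, and that the quadratic form [R(t)ζ, ζ] is the energy over (0, t) of the waveform with coefficients ζ.

```python
import numpy as np

T, m, n_basis = 1.0, 2000, 6
dt = T / m
s = (np.arange(m) + 0.5) * dt                       # midpoint grid on (0, T)

# Assumed basis (not from the paper): cosines, orthonormal in L2(0, T).
phi = np.sqrt(2.0 / T) * np.cos(np.outer(np.arange(n_basis), s) * np.pi / T)
phi[0] = 1.0 / np.sqrt(T)

def A(t, x):
    """a_n = int_0^t phi_n(s) x(s) ds, as in (2.7), by the midpoint rule."""
    k = int(round(t / dt))
    return phi[:, :k] @ x[:k] * dt

def R(t):
    """R(t) = A(t) A(t)^*: Gram matrix of the basis restricted to (0, t)."""
    k = int(round(t / dt))
    return phi[:, :k] @ phi[:, :k].T * dt

zeta = np.array([1.0, -0.5, 0.25, 0.0, 0.3, -0.1])  # hypothetical coefficients
t = 0.4
k = int(round(t / dt))
signal = zeta @ phi                                 # sum_n zeta_n phi_n(s)
quad_form = zeta @ R(t) @ zeta                      # [R(t) zeta, zeta]
energy = np.sum(signal[:k] ** 2) * dt               # int_0^t |sum_n zeta_n phi_n|^2 ds
print(np.allclose(R(T), np.eye(n_basis)), quad_form, energy)
```

Differentiating the quadratic form in t therefore gives the instantaneous signal power, which is the step used later when the likelihood functional is differentiated with respect to t.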


Note that A(T) = L. Let $P_n$ denote the projection operator corresponding to the
first n basis functions $\{\phi_i\}$, i = 1, ..., n. Then we define

$$ \hat{\zeta}(t) = \lim_{n} E\big[\zeta \mid A(t) P_n y\big] \tag{2.10} $$

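As shown in [3], we have (Bayes formula) that

$$ \hat{\zeta}(t) = \frac{\displaystyle \int_{\ell_2} \zeta\, \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)y] \big\} \Big)\, d\mu_\zeta}{\displaystyle \int_{\ell_2} \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)y] \big\} \Big)\, d\mu_\zeta} \tag{2.11} $$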

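Note that, by the Schwarz inequality,

$$ \|\hat{\zeta}(t)\|^2 \le \frac{\displaystyle \int_{\ell_2} \|\zeta\|^2 \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)y] \big\} \Big)\, d\mu_\zeta}{\displaystyle \int_{\ell_2} \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)y] \big\} \Big)\, d\mu_\zeta} $$

$$ = \frac{\displaystyle \int_{\ell_2} \|\zeta\|^2 \exp\Big( -\tfrac{1}{2} \|R(t)\zeta - A(t)y\|^2 \Big)\, d\mu_\zeta}{\displaystyle \int_{\ell_2} \exp\Big( -\tfrac{1}{2} \|R(t)\zeta - A(t)y\|^2 \Big)\, d\mu_\zeta} $$

$$ \le c\, E\big[\|\zeta\|^2\big]\, \exp\Big( \tfrac{1}{2} \big( \|A(t)y\| + k \big)^2 \Big), \qquad 0 < c,\ k < \infty \tag{2.12} $$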

It should be noted that such an estimate is not available in the Wiener process
version. Moreover we shall show that (2.9) is actually absolutely continuous in t
with an $L_1$-derivative. Let φ(t) be infinitely differentiable with compact support
in (0,T). Then

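$$ \int_0^T f(t,\omega)\, \phi'(t)\, dt = \int_{\ell_2} \Big[ \int_0^T \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)\omega] \big\} \Big)\, \phi'(t)\, dt \Big]\, d\mu_\zeta $$

$$ = \int_{\ell_2} \Big[ \int_0^T \Big( \tfrac{1}{2} \Big\| \sum_i \phi_i(t)\zeta_i \Big\|^2 - \Big[ \sum_i \phi_i(t)\zeta_i,\ \omega(t) \Big] \Big) \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)\omega] \big\} \Big)\, \phi(t)\, dt \Big]\, d\mu_\zeta \tag{2.13} $$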


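where we note that both

$$ \Big\| \sum_{i=1}^{\infty} \phi_i(t)\zeta_i \Big\|^2 \quad \text{and} \quad \Big[ \sum_{i=1}^{\infty} \phi_i(t)\zeta_i,\ \omega(t) \Big] $$

are in $L_1[0,T]$ for each ζ in $\ell_2$. Hence the derivative is (defined a.e.
0 < t < T):

$$ \int_{\ell_2} \Big( -\tfrac{1}{2} \Big\| \sum_i \phi_i(t)\zeta_i \Big\|^2 + \Big[ \sum_i \phi_i(t)\zeta_i,\ \omega(t) \Big] \Big) \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)\omega] \big\} \Big)\, d\mu_\zeta $$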

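We shall next prove that

$$ g_N(t) = \sum_{i=1}^{N} \phi_i(t)\, \hat{\zeta}_i(t), \qquad 0 < t < T $$

converges in the norm of W. But this is immediate from the fact that, analogous to
(2.12):

$$ \|g_N(t)\|^2 \le E\Big[ \Big\| \sum_{i=1}^{N} \phi_i(t)\zeta_i \Big\|^2 \Big]\, \exp\Big( \tfrac{1}{2} \|A(t)y\|^2 \Big) \qquad \text{a.e. } 0 < t < T $$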
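Let

$$ \hat{S}(t) = \sum_{i=1}^{\infty} \phi_i(t)\, \hat{\zeta}_i(t) $$

and

$$ \widehat{\|S(t)\|^2} = \frac{\displaystyle \int_{\ell_2} \Big\| \sum_{i=1}^{\infty} \phi_i(t)\zeta_i \Big\|^2 \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)y] \big\} \Big)\, d\mu_\zeta}{\displaystyle \int_{\ell_2} \exp\Big( -\tfrac{1}{2} \big\{ [R(t)\zeta,\zeta] - 2\,[\zeta,\, A(t)y] \big\} \Big)\, d\mu_\zeta} $$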

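Then from (2.13) we can write:

$$ \frac{d}{dt} \operatorname{Log} f(t,y) = -\tfrac{1}{2} \Big\{ \|\hat{S}(t)\|^2 - 2\,[\hat{S}(t),\, y(t)] + \widehat{\|S(t)\|^2} - \|\hat{S}(t)\|^2 \Big\} $$

and hence finally, for the log likelihood functional:

$$ \operatorname{Log} f(y) = -\tfrac{1}{2} \Big\{ \int_0^T \|\hat{S}(t)\|^2\, dt - 2 \int_0^T [\hat{S}(t),\, y(t)]\, dt + \int_0^T \Big( \widehat{\|S(t)\|^2} - \|\hat{S}(t)\|^2 \Big)\, dt \Big\} $$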


We note that the third term can also be expressed as

$$ \lim_{n \to \infty} E\big[ \|S(t) - \hat{S}(t)\|^2 \mid A(t) P_n y \big] \tag{2.14} $$
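The log likelihood formula, including its third term, can be checked numerically in a case where the signal measure is a finite mixture, so that every integral over the signal space becomes a finite sum. The sketch below is an illustration under assumptions of our own: a scalar observation, three hypothetical waveforms with prior weights `prior`, and an arbitrary smooth observation path `y`. It compares the directly computed value of Log f(y) with the value obtained by integrating the conditional mean, the observation, and the conditional mean square error over [0, T].

```python
import numpy as np

T, m = 1.0, 20_000
dt = T / m
t = (np.arange(m) + 0.5) * dt                    # midpoints of the time steps

# Hypothetical finite prior: the signal is one of three known waveforms.
signals = np.stack([np.sin(2 * np.pi * t),
                    np.cos(2 * np.pi * t),
                    0.5 * np.ones(m)])
prior = np.array([0.5, 0.3, 0.2])

# A smooth observation path, chosen arbitrarily for the demonstration.
y = np.sin(2 * np.pi * t) + 0.8 * np.cos(7.0 * t) + 0.3

# Exponent of the likelihood integrand for each waveform:
# -1/2 int_0^t S^2 ds + int_0^t S y ds, accumulated step by step.
incr = (-0.5 * signals ** 2 + signals * y) * dt
expo_end = np.cumsum(incr, axis=1)               # value at the end of each step
expo_mid = expo_end - 0.5 * incr                 # value at the step midpoint

# Posterior weights give the conditional mean and conditional mean square.
w = prior[:, None] * np.exp(expo_mid)
w /= w.sum(axis=0)
S_hat = np.sum(w * signals, axis=0)
S2_hat = np.sum(w * signals ** 2, axis=0)

third_term = np.sum(S2_hat - S_hat ** 2) * dt    # integrated conditional error
formula = -0.5 * (np.sum(S_hat ** 2 - 2.0 * S_hat * y) * dt + third_term)
direct = np.log(np.sum(prior * np.exp(expo_end[:, -1])))
print(direct, formula, third_term)
```

Omitting `third_term` from `formula` leaves the expression one would get from (4) with dY(t) replaced by y(t)dt, and the two values then no longer agree.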

The formula (2.13) differs from the Wiener process version in the appearance of the
third term; in the case where S(t) is Gaussian, we know that (2.14) reduces to

$$ E\big[ \|S(t) - \hat{S}(t)\|^2 \big] $$

which is then also independent of the observation y(·); see [3]. Note that (2.14)
can be large in the case where the noise level is large. Formula (2.13) was derived
in [3] for a seemingly less general case by a different method. Finally we remark
that (2.13) is consistent with the 'circle differential' formalism of Ito [4].
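To make the third term concrete, here is a worked scalar case under assumptions that are not in the text above: S(t) = θ s(t), with s(·) a known waveform in W and θ a zero-mean Gaussian amplitude of variance σ², independent of the noise. The symbols θ, s, σ, E(t) and Y(t) are introduced only for this illustration.

```latex
% Notation (assumed): E(t) = \int_0^t s(\tau)^2 \, d\tau, \quad
%                     Y(t) = \int_0^t s(\tau)\, y(\tau) \, d\tau.
\begin{align*}
  f(t, y) &= \int_{-\infty}^{\infty}
      \exp\Big(-\tfrac12\,\theta^2 E(t) + \theta\, Y(t)\Big)\,
      \frac{e^{-\theta^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\, d\theta
    = \big(1 + \sigma^2 E(t)\big)^{-1/2}
      \exp\!\Big(\frac{\sigma^2\, Y(t)^2}{2\,(1 + \sigma^2 E(t))}\Big), \\
  \hat S(t) &= s(t)\, \frac{\sigma^2\, Y(t)}{1 + \sigma^2 E(t)}, \qquad
  E\big[\,|S(t) - \hat S(t)|^2 \,\big|\, y \big]
    = \frac{\sigma^2\, s(t)^2}{1 + \sigma^2 E(t)}, \\
  \int_0^T \frac{\sigma^2\, s(t)^2}{1 + \sigma^2 E(t)}\, dt
    &= \log\big(1 + \sigma^2 E(T)\big).
\end{align*}
```

The conditional error does not depend on y, in line with the remark on the Gaussian case. Taking $-\tfrac{1}{2}$ times its integral reproduces the factor $(1 + \sigma^2 E(T))^{-1/2}$ in f(T, y). The first two terms of the log likelihood account for the exponential factor, since by ordinary calculus $\frac{d}{dt}\big[\sigma^2 Y(t)^2 / (2(1 + \sigma^2 E(t)))\big] = \hat S(t)\, y(t) - \tfrac{1}{2} \hat S(t)^2$.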